class: center, middle, inverse, title-slide

# Lecture 11
## Factorial Designs
### Psych 10 C
### University of California, Irvine
### 04/22/2022

---

## Factorial designs

- So far, we have talked about comparisons between groups defined by the values of a single variable.

--

- We had comparisons between two groups when our independent variable could take two values and participants were assigned at random to one of those values.

--

- We also had comparisons between two measures of the same participant. These designs are known as paired samples, and our objective was to test the differences between the two measures.

--

- A third problem was when we had multiple values for an independent variable, for example, when we have students that belong to multiple cohorts or when we have multiple conditions in an experiment.

--

- One important thing to note is that, even when our independent variable can take more than two values, we are still talking about a single independent variable.

---

## Factorial designs

- In contrast to the methods that we have talked about in class so far, there are problems that require us to look at the effects of more than one variable at a time.

--

- For example, when we test a new drug designed to reduce blood pressure, it might be important to consider whether the participant is male or female.

--

- The key point is that sometimes in our experiments we would like to consider the effects of more than one variable at a time, and how those effects might interact to give rise to our observations.

--

- When we have more than one independent variable in an experiment, each of which can take 2 or more categorical values (like smoking status, cohort, treatment, etc.), we call the experiment a **factorial design**.

---

## Factorial designs

- We refer to independent variables that can take 2 or more categorical values as **factors**.
For example:

--

- In our study about the relation between smoking status and lung capacity, we can refer to smoking status as "the factor smoking".

--

- In our study about the effects of a new drug on blood pressure, we can refer to the treatment condition as "the factor treatment".

--

- In order to describe our experimental designs, we use a type of notation designed specifically for this kind of problem.

---

## Factorial designs

- For example, suppose we are interested in comparing the performance of participants in a recognition memory task versus a free recall task, using high frequency words and low frequency words.

--

- In this example we have two factors: the first one is the "task" factor and the second one is the "word frequency" factor.

--

- Furthermore, each factor can take two different values: the task factor can be recognition or free recall, and the word frequency factor can be high or low frequency.

--

- We would call this experimental design a `\(2\times 2\)` **factorial design**, where the first number refers to the number of levels (values) of the first factor and the second number refers to the number of levels (values) of the second factor.

---

## Factorial designs and participant assignment

- If each participant is assigned to a single combination of the values of the factors (for example, one group of participants only responds to the free recall test with high frequency words, a different group responds to the free recall test with low frequency words, and so on), we refer to the design as a `\(2\times 2\)` **between subjects factorial design**.

--

- If all participants respond to **all** combinations of the factors, we refer to it as a `\(2\times 2\)` **within subjects factorial design**.

--

- This is very similar to what we had in the case of a single independent variable.

--

- A between subjects design is when different groups of participants are assigned to different values of the independent variable.
--

- A within subjects design is when participants respond to every level of our independent variable (like the before and after training problem in HW 2).

---

## Factorial designs and participant assignment

- Now that we have more than one independent variable, we can have a new kind of experimental design: a **mixed design**.

--

- A **mixed design** refers to an experiment where one of our factors (independent variables) is treated as between subjects and another is treated as within subjects.

--

- For example, suppose one group of participants responds to our free recall test with both low and high frequency words, and another group of participants responds to the recognition memory task with both low and high frequency words.

--

- We call this a `\(2\times2\)` **mixed factorial design**.

---

## Mixed designs

- Whenever we have a mixed design, we have to specify which factors are treated as between subjects and which factors are treated as within subjects.

--

- From our previous example, an accurate description of the design would be:

--

- This is a `\(2\times2\)` **mixed factorial design** where the task factor was manipulated between subjects (different groups do different tasks) and the word frequency factor was manipulated within subjects (every participant responded to a task with high frequency words and another task with low frequency words).

--

- An important thing to notice is that each type of design has a different number of independent groups, and that the number of combinations of levels that each group sees during the experiment is also different.

---

## Factorial designs

- Let's return to our memory experiment, with 2 levels of each factor (independent variable).
--

- If we had a `\(2\times2\)` **between subjects factorial design**, we would have 4 groups, and each group would see a single combination of the levels of the factors (*e.g.* free recall - high frequency, free recall - low frequency, recognition - high frequency, recognition - low frequency).

--

- If we had a `\(2\times2\)` **within subjects factorial design**, we would have a single group, and all participants in that group would see every combination of the levels of our two factors.

--

- Finally, if we had a `\(2\times2\)` **mixed factorial design** with task manipulated between subjects and word frequency manipulated within subjects, we would have 2 groups (one performing a free recall task and the other a recognition task), and each group would see both values of our second factor (each group would do its task with low and with high frequency words).

---

## Factorial designs

We have a `\(3\times 2\)` between subjects design,

.can-edit.key-likes[
How many factors do we have in the experiment?
- **ANS:**

How many levels does each factor have?
- **ANS:**

How many independent groups do we have in the experiment?
- **ANS:**

How many combinations of the factor levels does each group see?
- **ANS:**
]

---

## Factorial designs

We have a `\(2\times 2\times 3\)` within subjects design,

.can-edit.key-likes[
How many factors do we have in the experiment?
- **ANS:**

How many levels does each factor have?
- **ANS:**

How many independent groups do we have in the experiment?
- **ANS:**

How many combinations of the factor levels does each group see?
- **ANS:**
]

---

## Factorial designs

We have a `\(4\times 2\)` mixed design, with the first factor manipulated between subjects and the second factor manipulated within subjects:

.can-edit.key-likes[
How many levels does each factor have?
- **ANS:**

How many independent groups do we have in the experiment?
- **ANS:**

How many combinations of the factor levels does each group see?
- **ANS:**
]

---

## Between subjects factorial designs

- Within subjects and mixed designs require more "sophisticated" methods to be analyzed. Therefore, we will only cover between subjects factorial designs this quarter.

--

- To make it easier for us to talk about factorial designs, we have to introduce some new notation: first, because we now have more than one independent variable, and second, because we would like to study the effect of each variable or factor independently.

--

- We will now denote our observations as `\(y_{ijk}\)`. Note that we have only added one subscript to our observations.

--

- `\(j\)` denotes the level of our first factor (independent variable).

--

- `\(k\)` denotes the level of our second factor (independent variable).

--

- `\(i\)` denotes the observation number.

---

## Example between subjects factorial design

- We measure the levels of anxiety of 30 students at the end of their first year from two cohorts, 2019 and 2020, and we record whether or not they took a statistics course during their first year.

--

- We denote the cohort as `\(j=1,2\)`, where `\(1\)` represents students in the 2019 cohort and `\(2\)` represents students in the 2020 cohort.

--

- We denote with `\(k=1\)` students that did not take a statistics class during their first year and with `\(k=2\)` students that did. Then:

--

- `\(y_{2,1,1}\)` would represent the anxiety level at the end of the first year of the second student in the 2019 cohort that did not take a statistics course in their first year.

- `\(y_{9,2,1}\)` would represent the anxiety level at the end of the first year of the ninth student in the 2020 cohort that did not take a statistics course in their first year.

- `\(y_{1,2,2}\)` would represent the anxiety level at the end of the first year of the first student in the 2020 cohort that took a statistics course in their first year.
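---

## Example between subjects factorial design

The `\(y_{ijk}\)` notation can be read as a lookup: first pick the group `\((j, k)\)`, then pick the `\(i\)`-th observation inside it. The sketch below is only an illustration in Python. The scores are made-up numbers (not data from the study), and the names `scores`, `y` and `describe` are hypothetical helpers introduced here.

```python
# Hypothetical anxiety scores, invented only to illustrate the indexing.
# Keys are (j, k): j = cohort (1 = 2019, 2 = 2020),
# k = statistics course (1 = did not take it, 2 = took it).
scores = {
    (1, 1): [12.0, 15.0, 11.0],
    (1, 2): [14.0, 13.0, 16.0],
    (2, 1): [18.0, 17.0, 19.0],
    (2, 2): [21.0, 20.0, 22.0],
}

COHORT = {1: "2019", 2: "2020"}
STATS = {1: "no statistics course", 2: "statistics course"}


def y(i, j, k):
    """Return observation y_ijk using the 1-based indices from the slides."""
    return scores[(j, k)][i - 1]


def describe(i, j, k):
    """Spell out in words which observation y_ijk refers to."""
    return f"student {i}, cohort {COHORT[j]}, {STATS[k]}: {y(i, j, k)}"


print(describe(2, 1, 1))  # second student, 2019 cohort, no statistics course
print(describe(1, 2, 2))  # first student, 2020 cohort, statistics course
```

Storing one list per `\((j, k)\)` pair mirrors a between subjects design: every student appears in exactly one group.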
---

## Models for factorial designs

- When we had one independent variable that could take more than two values, we faced the problem of comparing multiple models in order to answer our research question.

--

- When we deal with factorial designs we will have a similar problem. However, in factorial designs we will typically be interested in what we call **main effects**, **additive effects** and **interactions**.

--

- A **main effect** refers to how the expected value of our dependent variable changes given the levels of a single factor (independent variable).

--

- If we have two factors (independent variables) in our design, we will have two main effects models (one for each factor).

--

- In our anxiety example, one model would be the "**main effects of cohort**" model, where we assume that only the cohort that a student belongs to contributes to the differences between anxiety levels (regardless of whether they took a statistics class or not).

--

- The second model would be the "**main effects of statistics class**" model, where we assume that only taking a statistics class during their first year contributes to any differences between the anxiety levels of participants (regardless of cohort).

---

## Models for factorial designs

- Main effects models are similar to what we have studied before. However, in factorial designs we will have a new class of models known as **additive models**.

--

- An **additive effects** model formalizes the assumption that the effects of each of our factors are independent; in other words, the value of one factor will not change the effect that the second factor has on the expected value of our dependent variable. Nevertheless, both factors have an effect on the dependent variable.

--

- If we only have 2 factors in our design, we will only have one **additive model**. This model will be a combination of the two main effects models.
--

- In our anxiety example, the **additive model** assumes that both cohort and whether the student took a statistics class during their first year have an effect on the average anxiety levels of students. However, the effects of cohort and of the class are independent, so they are calculated separately and then added together to make a prediction.

--

- That's where the name **additive** comes from: we sum the independent effects of our factors in order to make a prediction about the anxiety levels of each group.

---

## Models for factorial designs

- Of the models that we need to compare in factorial designs, the most complicated one is the **interaction**.

--

- An **interaction** model formalizes the assumption that the effect that one of our factors has on the expected value of our dependent variable depends on the value of the second factor.

--

- **Interaction** models are easy to implement (they are very similar to the Effects models that we have used to this point); however, they are hard to interpret.

--

- In our anxiety example, an **interaction** model would assume that the mean of each group is independent and that the means are not just the sum of the main effects. In other words, each combination of cohort and whether a student took a statistics class would have a different expected anxiety level that might not be related to the other groups.

--

- An example of an interaction would be when only students in the 2020 cohort that took a statistics class during their first year have a different anxiety level compared to the rest of the students in the study (in other words, the effect of taking a stats class is different depending on the cohort the student belongs to!).

---

## Models for factorial designs

- As with the rest of the models in the class, the key will be the predictions that the model makes. The difference between the models is just how we calculate those predictions.
--

- Once we have the predictions of each of our models, everything else will follow as before.

--

- In other words, the steps that we need to carry out with any model for a factorial design will be:

--

1. Find the predictions of the corresponding model for each combination of the levels of the factors, `\(\hat{\mu}_{jk}\)`.

--

2. Calculate the squared difference between each observation and the prediction that the model makes for it, `\((y_{ijk}-\hat{\mu}_{jk})^2\)`.

--

3. Add all those squared differences to get the Sum of Squared Error of the model (SSE).

--

4. Use the SSE to compute the Mean Squared Error (MSE), the proportion of variance accounted for by the model `\((R^2)\)` and the BIC.

---

## Models for factorial designs

- Next week we will define each of the models that we need when we have a `\(2\times 2\)` between subjects factorial design.

--

- We will need to make some changes to how we express our models, which means that we will arrive at our predictions using a different approach. However, this new approach will also use averages.

--

- Note that if you remember the steps that we have taken to solve the previous problems that we have seen in class, then this more complicated experimental design will be easier to follow, given that the steps we need to take are the same.

--

- This is because all of these methods belong to the same family, known as linear models.
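---

## Models for factorial designs

The four steps (predictions, squared differences, SSE, then MSE, `\(R^2\)` and BIC) can be sketched in Python. This is a rough illustration under several assumptions, not the definitions we will use next week:

- The data are made-up numbers.
- The predictions come from two simple models: a single grand average for everyone, and a separate average for each group, which matches the description of the interaction model.
- MSE is taken as SSE divided by `\(n\)`, and BIC as `\(n\ln(\text{MSE}) + k\ln(n)\)`, with `\(k\)` here counting only the estimated means. These are common conventions that may differ from the formulas used in class.

```python
import math

# Hypothetical anxiety scores, invented for illustration; keys are (j, k).
scores = {
    (1, 1): [12.0, 15.0, 11.0],
    (1, 2): [14.0, 13.0, 16.0],
    (2, 1): [18.0, 17.0, 19.0],
    (2, 2): [21.0, 20.0, 22.0],
}


def mean(values):
    return sum(values) / len(values)


def sse(data, predictions):
    """Steps 2 and 3: add up (y_ijk - mu_hat_jk)^2 over all observations."""
    return sum((obs - predictions[cell]) ** 2
               for cell, values in data.items() for obs in values)


def summarize(data, predictions, n_params):
    """Step 4: MSE, R^2 and BIC from the SSE (assumed conventions)."""
    n = sum(len(values) for values in data.values())
    grand = mean([obs for values in data.values() for obs in values])
    sse_null = sse(data, {cell: grand for cell in data})
    sse_model = sse(data, predictions)
    mse = sse_model / n                                # assumption: SSE / n
    r2 = 1 - sse_model / sse_null                      # variance accounted for
    bic = n * math.log(mse) + n_params * math.log(n)   # assumed BIC form
    return {"sse": sse_model, "mse": mse, "r2": r2, "bic": bic}


# Step 1: predictions mu_hat_jk for each combination of factor levels.
grand_mean = mean([obs for values in scores.values() for obs in values])
null_pred = {cell: grand_mean for cell in scores}                 # one mean
cell_pred = {cell: mean(values) for cell, values in scores.items()}  # one per group

null_fit = summarize(scores, null_pred, n_params=1)
cell_fit = summarize(scores, cell_pred, n_params=4)

for name, fit in [("grand mean", null_fit), ("group means", cell_fit)]:
    print(name, {key: round(value, 3) for key, value in fit.items()})
```

Only step 1 changes from one model to the next. `sse` and `summarize` are reused unchanged, which is the sense in which the steps are the same for every model.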